Abstract

Compressed sensing with sparse frame representations has a much greater range of practical
applications than compressed sensing with orthonormal bases. In such settings, one approach to
recovering the signal is known as $\ell_1$-analysis. In this article we extend the performance
analysis of this approach by providing a weaker recovery condition than existing results in the
literature. Our analysis is also broadly based on general frames and alternative dual frames (as
analysis operators). As one application of this general-dual-based approach and performance
analysis, an optimal-dual-based technique is proposed to demonstrate the effectiveness of using
alternative dual frames as analysis operators. An iterative algorithm is outlined for solving the
optimal-dual-based $\ell_1$-analysis problem. The effectiveness of the proposed method and
algorithm is demonstrated through several experiments.

1 Introduction

Compressed sensing concerns the problem of recovering a high-dimensional sparse signal from a
small number of linear measurements

$$y = \Phi f + z, \tag{1}$$

where $\Phi$ is an $m \times n$ sensing matrix with $m \ll n$ and $z \in \mathbb{R}^m$ is a noise term modeling
measurement error. The goal is to reconstruct the unknown signal $f \in \mathbb{R}^n$ from the available
measurements $y \in \mathbb{R}^m$. The literature on compressed sensing is extensive; see, e.g., [12], [13], [14], [18], [19].

In standard compressed sensing scenarios, it is usually assumed that f has a sparse (or nearly 
sparse) representation in an orthonormal basis. However, a growing number of applications in signal 
processing point to problems where f is sparse with respect to an overcomplete dictionary or a frame 
rather than an orthonormal basis, see, e.g., [29], [16], [5], and references therein. Examples include, 
e.g., signal modeling in array signal processing (oversampled array steering matrix), reflected radar 
and sonar signals (Gabor frames), and images with curves (curvelets), etc. The flexibility of frames 
is the key characteristic that empowers frames to become a natural and concise signal representation 
tool. Compressed sensing that deals with sparse representations with respect to frames, studied in,
e.g., [33], [15], therefore becomes particularly important. In this setting the signal $f$ is
expressed as $f = Dx$, where $D \in \mathbb{R}^{n \times d}$ ($n < d$) is a matrix of frame vectors (as columns) that
are often rather coherent in applications, and $x \in \mathbb{R}^d$ is a sparse coefficient vector. The linear
measurements of $f$ then become

$$y = \Phi D x + z. \tag{2}$$

Since $x$ is assumed sparse, a straightforward way of recovering $f$ from (2) is known as $\ell_1$-synthesis
(or the synthesis-based method) [16], [21], [15]. One first finds the sparsest possible coefficient
vector by solving an $\ell_1$ minimization problem

$$\hat{x} = \arg\min_{x \in \mathbb{R}^d} \|x\|_1 \quad \text{s.t.} \quad \|y - \Phi D x\|_2 \le \epsilon, \tag{3}$$

where $\|x\|_p$ ($p = 1, 2$) denotes the standard $\ell_p$-norm of the vector $x$ and $\epsilon^2$ is a likely upper bound
on the noise power $\|z\|_2^2$. The estimate of $f$ is then obtained via a synthesis operation, i.e., $\hat{f} = D\hat{x}$.

Although empirical studies show that $\ell_1$-synthesis often achieves good recovery results, little is
known about the theoretical performance of this method. The analytical results in [33] essentially
require that the frame $D$ has columns that are extremely uncorrelated, so that $\Phi D$ satisfies the
requirements imposed by the traditional compressed sensing assumptions. However, these requirements
are often infeasible when $D$ is highly coherent. For example, consider a simple case in which
$\Phi \in \mathbb{R}^{m \times n}$ is a Gaussian matrix with i.i.d. entries, so that $\Phi \sim \mathcal{N}(0, I_n \otimes I_m)$, where $\otimes$ denotes the
Kronecker product and $I_m$ is the identity matrix of size $m$. It is now well known that, with very
high probability, $\Phi$ has a small $s$-restricted isometry constant when $m$ is on the order of $s \log(n/s)$
[12]. Let us now examine $\Phi D$. It is not hard to show that $\Phi D \sim \mathcal{N}(0, D^*D \otimes I_m)$, where $(\cdot)^*$
denotes the transpose operation. Consequently, if $D$ is a coherent frame, $\Phi D$ does not generally
satisfy the common restricted isometry property (RIP) [33]. Meanwhile, the mutual incoherence
property (MIP) [19] may not apply either, as it is very hard for $\Phi D$ to satisfy the MIP when
$D$ is highly correlated.
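
The covariance claim for $\Phi D$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: the frame `D` is a hypothetical random matrix whose last column is forced to nearly duplicate the first, and the sizes `n`, `d`, `m` are arbitrary choices (with `m` taken large only so that the row covariance is estimated well).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 4, 8, 20000  # illustrative sizes; m is large only for a good estimate

# Hypothetical coherent frame: random n x d matrix whose last column is a
# small perturbation (5% of the norm) of the first column.
D = rng.standard_normal((n, d))
e = rng.standard_normal(n)
e *= 0.05 * np.linalg.norm(D[:, 0]) / np.linalg.norm(e)
D[:, -1] = D[:, 0] + e

Phi = rng.standard_normal((m, n))  # i.i.d. N(0, 1) entries
PhiD = Phi @ D                     # each row is distributed as N(0, D^T D)

# Empirical covariance of the rows of Phi D versus the predicted D^T D.
empirical_cov = PhiD.T @ PhiD / m
target_cov = D.T @ D
rel_err = np.linalg.norm(empirical_cov - target_cov) / np.linalg.norm(target_cov)

# Mutual coherence of D: largest |cosine| between two distinct columns.
Dn = D / np.linalg.norm(D, axis=0)
gram = np.abs(Dn.T @ Dn)
np.fill_diagonal(gram, 0.0)
coherence = gram.max()

print(f"relative covariance error: {rel_err:.3f}")
print(f"mutual coherence of D:     {coherence:.3f}")
```

The columns of $\Phi D$ inherit the correlations encoded in $D^*D$, so a coherent $D$ yields a coherent effective sensing matrix regardless of how well-conditioned $\Phi$ is.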

The analysis-based method, $\ell_1$-analysis, is an alternative to $\ell_1$-synthesis, e.g., [20], [21], [15].
It finds the estimate $\hat{f}$ directly by solving the problem

$$\hat{f} = \arg\min_{f \in \mathbb{R}^n} \|D^* f\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \le \epsilon. \tag{4}$$

When $D$ is a basis, the $\ell_1$-analysis and the $\ell_1$-synthesis approaches are equivalent. However, when
$D$ is an overcomplete frame, it was observed that there is a recovery performance gap between them
[16], [21]. No clear conclusion has been reached as to which approach is better without specifying
applications and associated data sets.

A performance study of the $\ell_1$-analysis approach was recently given in [15]. It was shown that
(4) recovers a signal $f$ with an error bound

$$\|\hat{f} - f\|_2 \le C_0 \cdot \epsilon + C_1 \cdot \frac{\|D^* f - (D^* f)_s\|_1}{\sqrt{s}}, \tag{5}$$

provided that $\Phi$ obeys a D-RIP condition (see (12)) with $\delta_{2s} < 0.08$, where the columns of $D$ form
a Parseval frame and $(D^*f)_s$ is a vector consisting of the largest $s$ entries of $D^*f$ in magnitude.
It follows from (5) that if $D^*f$ has rapidly decreasing coefficients, then the solution to (4) is very
accurate. In particular, if the measurements of $f$ are noiseless and $D^*f$ is exactly $s$-sparse, then $f$ is
recovered exactly.

Indeed, $\ell_1$-analysis shows promising performance in applications where both the columns of
the Gram matrix $D^*D$ and the coefficient vector $x$ are reasonably sparse; see, e.g., [21], [7], [15]. In
other words, as long as the frame coefficient vector $D^*f$ is sensibly sparse, $\ell_1$-analysis can be the
right method to use.

However, the $\ell_1$-analysis approach of (4) is certainly not flawless. That $f$ is sparse in terms of
$D$ does not imply that $D^*f$ is necessarily sparse. In fact, as the canonical dual frame expansion in the
case of Parseval frames, $D^*f = D^*Dx$ has the minimum $\ell_2$ norm by the frame property (see, e.g.,
[17]) and is usually fully populated, which is also pointed out in [33].

For a given signal $f$, there are infinitely many ways to represent $f$ by the columns of $D$. In the
spirit of frame expansions, all coefficients of a frame expansion of $f$ in $D$ should correspond to some
dual frame of $D$. It is not hard to imagine that there should be some dual frame of $D$, denoted by
$\tilde{D}$, such that $\tilde{D}^*f$ is sparser than the canonical coefficients $\bar{D}^*f$ (which equal $D^*f$ for a Parseval
frame). Furthermore, if a similar error bound (just like (5)) holds for arbitrary dual frame analysis
operators, then one may expect a better recovery performance by taking some "proper" dual frame of
$D$ as the analysis operator. Motivated by this observation, we consider a general-dual-based
$\ell_1$-analysis as follows:

$$\hat{f} = \arg\min_{f \in \mathbb{R}^n} \|\tilde{D}^* f\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \le \epsilon, \tag{6}$$

where the columns of the analysis operator $\tilde{D}$ form a general (i.e., any) dual frame of $D$.

In this article, we first present a performance analysis for the general-dual-based $\ell_1$-analysis
approach (6). It turns out that a recovery error bound exists that is entirely similar to (5). More
precisely, under suitable conditions, (6) recovers a signal $f$ with an error bound

$$\|\hat{f} - f\|_2 \le C_0 \cdot \epsilon + C_1 \cdot \frac{\|\tilde{D}^* f - (\tilde{D}^* f)_s\|_1}{\sqrt{s}}. \tag{7}$$

We show that the sufficient conditions guaranteeing the recovery performance estimate (7) depend not
only on the D-RIP of $\Phi$, but also on the ratio of frame bounds. By utilizing the Shifting Inequality
[9], the recovery condition on the sensing matrix is improved from $\delta_{2s} < 0.08$ [15] to $\delta_{2s} < 0.2$ under
the same assumptions that the columns of $D$ form a Parseval frame and $\tilde{D} = D$.

The important question then is how to choose an appropriate dual frame $\tilde{D}$ such that $\tilde{D}^*f$
is as sparse as possible. One approach, which we propose here, is the method of optimal-dual-based
$\ell_1$-analysis:

$$\hat{f} = \arg\min_{D\tilde{D}^* = I,\ f \in \mathbb{R}^n} \|\tilde{D}^* f\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \le \epsilon, \tag{8}$$

where the optimization is not only over the signal space but also over all dual frames of $D$. Note
that the class of all dual frames of $D$ is given by [28] (see (17))

$$\tilde{D} = (DD^*)^{-1}D + W^*\left(I_d - D^*(DD^*)^{-1}D\right) = \bar{D} + W^*P, \tag{9}$$

where $\bar{D} = (DD^*)^{-1}D$ denotes the canonical dual frame of $D$, $P = I_d - D^*(DD^*)^{-1}D$ is the
orthogonal projection onto the null space of $D$, and $W \in \mathbb{R}^{d \times n}$ is an arbitrary matrix. Plugging (9) into
(8), we obtain

$$\hat{f} = \arg\min_{f \in \mathbb{R}^n,\ g \in \mathbb{R}^d} \|\bar{D}^* f + Pg\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \le \epsilon, \tag{10}$$

where we have used the fact that when $f \ne 0$, $g = Wf$ can be any vector in $\mathbb{R}^d$ because
$W$ is free.
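
The parametrization of dual frames and the substitution $g = Wf$ can be illustrated numerically. The snippet below is a small sketch under assumed data: `D` and `W` are hypothetical random matrices and the dimensions are arbitrary. It builds the canonical dual, the null-space projector, and one alternative dual, and exposes the quantities needed to check the identities used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 7                                  # illustrative dimensions, n < d

D = rng.standard_normal((n, d))              # hypothetical frame (full row rank almost surely)
G_inv = np.linalg.inv(D @ D.T)               # (D D^*)^{-1}
D_bar = G_inv @ D                            # canonical dual frame, n x d
P = np.eye(d) - D.T @ G_inv @ D              # orthogonal projector onto null(D)

W = rng.standard_normal((d, n))              # arbitrary matrix
D_tilde = D_bar + W.T @ P                    # an alternative dual frame

f = rng.standard_normal(n)
g = W @ f                                    # the free vector appearing in the reformulation

coeff_canonical = D_bar.T @ f                # canonical coefficients
coeff_alternative = D_tilde.T @ f            # equals coeff_canonical + P @ g

print("dual property   :", np.allclose(D @ D_tilde.T, np.eye(n)))
print("splitting holds :", np.allclose(coeff_alternative, coeff_canonical + P @ g))
print("l2 norms (canonical, alternative):",
      np.linalg.norm(coeff_canonical), np.linalg.norm(coeff_alternative))
```

Because the canonical coefficients lie in the range of $D^*$ while $Pg$ lies in the null space of $D$, the two parts are orthogonal, so any alternative dual can only increase the $\ell_2$ norm of the coefficients; the hope expressed in the text is that it may nevertheless reduce their $\ell_1$ norm.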

Clearly, the solution to (10) corresponds to that of (6) with some optimal dual frame,
say $\tilde{D}_o$, as the analysis operator. The optimality here is in the sense that $\|\tilde{D}_o^*\hat{f}\|_1$ attains the
smallest value of $\|\tilde{D}^*f\|_1$ among all dual frames $\tilde{D}$ of $D$ and all feasible signals $f$ satisfying the
constraint in (10). When $f$ is sparse with respect to $D$, it is highly desirable that the corresponding
optimal dual frame be effective in sparsifying the true signal $f$. It then follows from (7) that an
accurate recovery of $f$ may be achieved by the solution of (10). Indeed, we have seen that signal
recovery via (10) is much more effective than that of the $\ell_1$-analysis approach (4), which uses the
canonical dual frame as the analysis operator.

Finally, we also develop an iterative algorithm for solving the optimal-dual-based $\ell_1$-analysis
problem. The proposed algorithm is based on the split Bregman iteration introduced in [23]. Our
numerical results show that the proposed algorithm is very fast when properly chosen parameter
values are used.

This paper is organized as follows. Section 2 contains preliminary discussions about compressed
sensing with general frames. Performance studies for the general-dual-based $\ell_1$-analysis approach are
presented in Section 3. In Section 4, an optimal-dual-based $\ell_1$-analysis approach and a corresponding
iterative algorithm are discussed. In Section 5, results of numerical experiments are presented
to illustrate the effectiveness of signal recovery via the optimal-dual-based $\ell_1$-analysis approach.
Concluding remarks are given in Section 6. The appendix covers the basics of the Bregman
iteration, which supports the discussion of the algorithm presented in Section 4.

2 Preliminaries 

2.1 Preliminaries for Compressed Sensing 

Let $x \in \mathbb{R}^d$ be a column vector. The support of $x$ is defined as $\mathrm{supp}(x) = \{i : x_i \ne 0,\ i = 1, \ldots, d\}$.
For $s \in \mathbb{N}$, a vector $x$ is said to be $s$-sparse if $|\mathrm{supp}(x)| \le s$. For $T \subset \{1, \ldots, d\}$, $x_T$ stands for a
$|T|$-long vector taking entries from $x$ indexed by $T$. Similarly, $D_T$ is the submatrix of $D$ restricted
to the columns indexed by $T$. We shall write $D_T^* = (D_T)^*$, and use the standard notation $\|x\|_q$ to
denote the $\ell_q$-norm of $x$:

$$\|x\|_q = \begin{cases} \left(\sum_{i=1}^{d} |x_i|^q\right)^{1/q}, & 1 \le q < \infty, \\ \max_{1 \le i \le d} |x_i|, & q = \infty. \end{cases}$$

For an $m \times n$ measurement matrix $\Phi$, we say that $\Phi$ obeys the restricted isometry property [10]
with constant $\gamma_s \in (0, 1)$ if

$$(1 - \gamma_s)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \gamma_s)\|x\|_2^2 \tag{11}$$

holds for all $s$-sparse signals $x$. We say that $\Phi$ satisfies the restricted isometry property adapted to
$D$ (abbreviated D-RIP) [15] with constant $\delta_s \in (0, 1)$ if

$$(1 - \delta_s)\|v\|_2^2 \le \|\Phi v\|_2^2 \le (1 + \delta_s)\|v\|_2^2 \tag{12}$$

holds for all $v \in \Sigma_s$, where $\Sigma_s$ is the union of all subspaces spanned by all subsets of $s$ columns of
$D$. Equivalently, $\Sigma_s$ is the image under $D$ of all $s$-sparse vectors. As with $\gamma_s$, it is easy to see that $\delta_s$ is
monotone, i.e., $\delta_s \le \delta_{s_1}$ if $s \le s_1 \le d$.

The D-RIP condition has also been validated for a number of matrix classes. For instance, it was shown in
[15] that if an $m \times n$ matrix $\Phi$ obeys a concentration inequality of the type

$$\Pr\left( \left| \|\Phi v\|_2^2 - \|v\|_2^2 \right| \ge \delta \|v\|_2^2 \right) \le c\, e^{-\gamma \delta^2 m}, \quad \delta \in (0, 1), \tag{13}$$

for any fixed $v \in \mathbb{R}^n$, where $\gamma$ and $c$ are positive constants, then $\Phi$ satisfies the D-RIP
(associated with some D-RIP constant) with overwhelming probability provided that $m$ is on the
order of $s \log(d/s)$. Many types of random matrices satisfy (13); examples include matrices
with Gaussian, subgaussian, or Bernoulli entries. Very recently, it has also been shown in [27] that
randomizing the column signs of any matrix that satisfies the standard RIP results in a matrix
which satisfies the Johnson-Lindenstrauss lemma [26]. Such a matrix would then satisfy the D-RIP
via (13). Consequently, a partial Fourier matrix (or partial circulant matrix) with randomized column
signs will satisfy the D-RIP, since these matrices are known to satisfy the RIP.
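
Computing a D-RIP constant exactly is combinatorial, but random sampling from the union of subspaces gives a lower estimate that is useful for intuition. The helper below is only an illustrative sketch: the function name `drip_lower_estimate`, the sampling scheme, and the trial count are assumptions of this example, and the returned value can only underestimate the true constant.

```python
import numpy as np

def drip_lower_estimate(Phi, D, s, trials=2000, seed=0):
    """Monte Carlo lower estimate of the D-RIP constant of Phi at sparsity s.

    Samples vectors v = D_T a with |T| = s and Gaussian a, and records the
    largest observed deviation of ||Phi v||^2 / ||v||^2 from 1.
    """
    rng = np.random.default_rng(seed)
    d = D.shape[1]
    worst = 0.0
    for _ in range(trials):
        T = rng.choice(d, size=s, replace=False)   # random support of size s
        v = D[:, T] @ rng.standard_normal(s)       # a vector in the union of subspaces
        ratio = np.sum((Phi @ v) ** 2) / np.sum(v ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

# Example with hypothetical sizes: Gaussian Phi scaled so that E||Phi v||^2 = ||v||^2.
rng = np.random.default_rng(3)
m, n, d, s = 400, 16, 32, 3
D_example = rng.standard_normal((n, d))
Phi_example = rng.standard_normal((m, n)) / np.sqrt(m)
print("estimated lower bound on the D-RIP constant:",
      drip_lower_estimate(Phi_example, D_example, s))
```

A small estimate does not certify the D-RIP, since the worst-case vector may simply not have been sampled; a large estimate, however, does show that the condition fails for that constant.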

2.2 Preliminaries for Frame Theory 

A set of vectors $\{d_k\}_{k \in I}$ in $\mathbb{R}^n$ is a frame of $\mathbb{R}^n$ if there exist constants $0 < A \le B < \infty$ such that

$$\forall f \in \mathbb{R}^n, \quad A\|f\|_2^2 \le \sum_{k \in I} |\langle f, d_k \rangle|^2 \le B\|f\|_2^2, \tag{14}$$

where the numbers $A$ and $B$ are called frame bounds. A frame that is not a basis is said to be
overcomplete or redundant. More details about frames can be found in, e.g., [17], [23], [25]. In matrix
form, (14) can be reformulated as

$$\forall f \in \mathbb{R}^n, \quad A\|f\|_2^2 \le f^*(DD^*)f \le B\|f\|_2^2, \tag{15}$$

where $\{d_k\}_{k \in I}$ are the columns of $D$. When $A = B = 1$, the columns of $D$ form a Parseval frame
and $DD^* = I$. A frame $\{\tilde{d}_k\}_{k \in I}$ is an alternative dual frame of $\{d_k\}_{k \in I}$ if

$$\forall f \in \mathbb{R}^n, \quad f = \sum_{k \in I} \langle f, \tilde{d}_k \rangle d_k = \sum_{k \in I} \langle f, d_k \rangle \tilde{d}_k. \tag{16}$$

For every given overcomplete frame $\{d_k\}_{k \in I}$, there are infinitely many dual frames $\{\tilde{d}_k\}_{k \in I}$ such that
(16) holds [28]. More precisely, the class of all dual frames of $D$ is given by the columns of $\tilde{D}$, where

$$\tilde{D} = (DD^*)^{-1}D + W^*\left(I_d - D^*(DD^*)^{-1}D\right) = (DD^*)^{-1}D + W^*P. \tag{17}$$

Note that $D\tilde{D}^* = I$. When $W = 0$, $\tilde{D}$ reduces to the canonical dual frame $\bar{D} = (DD^*)^{-1}D$. The
lower and upper frame bounds of $\bar{D}$ are $1/B$ and $1/A$, respectively. For $f \in \mathbb{R}^n$, the canonical
coefficients $\bar{D}^*f$ have the minimum $\ell_2$ norm, i.e., $\|\bar{D}^*f\|_2 = \min_{x : Dx = f} \|x\|_2$.
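
In finite dimensions these definitions reduce to standard linear algebra, which the following sketch makes concrete. It assumes a hypothetical random frame `D`; the optimal frame bounds are the extreme eigenvalues of $DD^*$, and the canonical coefficients coincide with the minimum-norm solution returned by the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 9
D = rng.standard_normal((n, d))            # hypothetical overcomplete frame

# Optimal frame bounds: smallest and largest eigenvalues of D D^*.
eig = np.linalg.eigvalsh(D @ D.T)          # returned in ascending order
A, B = eig[0], eig[-1]

# Canonical dual frame and its frame bounds (expected: 1/B and 1/A).
D_bar = np.linalg.solve(D @ D.T, D)
eig_dual = np.linalg.eigvalsh(D_bar @ D_bar.T)

# Frame inequality for a test vector f.
f = rng.standard_normal(n)
energy = np.sum((D.T @ f) ** 2)            # sum_k |<f, d_k>|^2

# Canonical coefficients are the minimum l2-norm solution of D x = f.
x_min_norm = np.linalg.pinv(D) @ f

print(f"A = {A:.4f}, B = {B:.4f}, ratio B/A = {B / A:.4f}")
print("dual bounds:", eig_dual[0], eig_dual[-1])
```

The ratio `B / A` computed here is the quantity that later enters the recovery conditions when the canonical dual frame is used as the analysis operator.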

2.3 The Shifting Inequality 

We now briefly discuss the Shifting Inequality [9], which is a very useful tool for performing finer
estimation of quantities involving $\ell_1$ and $\ell_2$ norms. A different proof of this inequality is also given
in [22].

Lemma 1. (Shifting Inequality [9]) Let $q$, $r$ be positive integers satisfying $q \le 3r$. Then any
nonincreasing sequence of real numbers $a_1 \ge \cdots \ge a_r \ge b_1 \ge \cdots \ge b_q \ge c_1 \ge \cdots \ge c_r \ge 0$ satisfies

$$\sqrt{\sum_{i=1}^{q} b_i^2 + \sum_{i=1}^{r} c_i^2} \le \frac{\sum_{i=1}^{r} a_i + \sum_{i=1}^{q} b_i}{\sqrt{q + r}}. \tag{18}$$
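
A quick numerical sanity check of the Shifting Inequality can help when applying it with particular block sizes. The sketch below is illustrative only; the helper names `shifting_gap` and `random_nonincreasing` are introduced here and do not come from [9].

```python
import numpy as np

def shifting_gap(a, b, c):
    """Right-hand side minus left-hand side of the Shifting Inequality.

    a has r entries, b has q entries, c has r entries; the concatenation
    (a, b, c) is assumed nonincreasing and nonnegative.
    """
    a, b, c = map(np.asarray, (a, b, c))
    q, r = len(b), len(a)
    lhs = np.sqrt(np.sum(b ** 2) + np.sum(c ** 2))
    rhs = (np.sum(a) + np.sum(b)) / np.sqrt(q + r)
    return rhs - lhs

def random_nonincreasing(r, q, rng):
    """Draw a random nonincreasing nonnegative sequence and split it into (a, b, c)."""
    seq = np.sort(rng.random(2 * r + q))[::-1]
    return seq[:r], seq[r:r + q], seq[r + q:]

rng = np.random.default_rng(0)
worst = min(
    shifting_gap(*random_nonincreasing(r, q, rng))
    for r in range(1, 5)
    for q in range(1, 3 * r + 1)
    for _ in range(200)
)
print("smallest observed gap (should be nonnegative):", worst)

# The requirement q <= 3r matters: with r = 1 and q = 5 the inequality can fail.
print("gap with q > 3r:", shifting_gap([1.0], [1.0, 0.0, 0.0, 0.0, 0.0], [0.0]))
```

A constant sequence gives equality, so the bound cannot be improved in general; the last line shows a violation once $q$ exceeds $3r$.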
3 Sufficient Conditions for General-dual-based $\ell_1$-analysis

In this section, we establish theoretical results for the general-dual-based $\ell_1$-analysis approach (6),
in which the analysis operator can be any dual frame of $D$. Our main result is that, under suitable
conditions, the solution to (6) is very accurate provided that $\tilde{D}^*f$ has rapidly decreasing coefficients.
We present two results with slightly different emphases: the first covers the case in which the analysis
operator is an alternative dual frame, and the second the case in which it is the canonical dual frame.

3.1 The Case of Alternative Dual Frames 

Theorem 1. Let $D$ be a general frame of $\mathbb{R}^n$ with frame bounds $0 < A \le B < \infty$. Let $\tilde{D}$ be an
alternative dual frame of $D$ with frame bounds $0 < \tilde{A} \le \tilde{B} < \infty$, and let $\rho = s/b$. Suppose

$$\left(1 - \sqrt{\rho B \tilde{B}}\right)^2 \cdot \delta_{s+a} + \rho B \tilde{B} \cdot \delta_b < 1 - 2\sqrt{\rho B \tilde{B}} \tag{19}$$

holds for some positive integers $a$ and $b$ satisfying $0 < b - a \le 3a$. Then the solution $\hat{f}$ to (6) satisfies

$$\|\hat{f} - f\|_2 \le C_0 \cdot \epsilon + C_1 \cdot \frac{\|\tilde{D}^* f - (\tilde{D}^* f)_s\|_1}{\sqrt{s}}, \tag{20}$$

where $C_0$ and $C_1$ are some constants and $(\tilde{D}^*f)_s$ denotes the vector consisting of the largest $s$ entries
of $\tilde{D}^*f$ in magnitude.

Proof. The proof is inspired by that of [11]. Let $f$ and $\hat{f}$ be as in the theorem. Set $h = f - \hat{f}$. Our
goal is to bound the norm of $h$. Without loss of generality, we assume that the first $s$ entries of $\tilde{D}^*f$
are the largest in magnitude. Rearranging if necessary, we may also assume that

$$|(\tilde{D}^*h)(s+1)| \ge |(\tilde{D}^*h)(s+2)| \ge \cdots,$$

where $(\tilde{D}^*h)(k)$ denotes the $k$th component of $\tilde{D}^*h$. Let $T_0 = \{1, 2, \ldots, s\}$. In order to apply the
Shifting Inequality, we partition $T_0^c$ (the complement of $T_0$) into the following sets: $T_1 = \{s+1, s+2, \ldots, s+a\}$
and $T_i = \{s+a+(i-2)b+1, \ldots, s+a+(i-1)b\}$, $i = 2, 3, \ldots$, with the last subset of
size less than or equal to $b$, where $a$ and $b$ are positive integers satisfying $0 < b - a \le 3a$. Further
divide each $T_i$, $i \ge 2$, into two pieces. Set

$$T_{i1} = \{s+a+(i-2)b+1, \ldots, s+(i-1)b\}$$

and

$$T_{i2} = T_i \setminus T_{i1} = \{s+(i-1)b+1, \ldots, s+(i-1)b+a\}.$$

Note that $|T_{i1}| = b - a$ and $|T_{i2}| = a$ for all $i \ge 2$. For simplicity, we denote $T_{01} = T_0 \cup T_1$. Note
first that

$$\begin{aligned}
\|h\|_2 = \|D\tilde{D}^*h\|_2 &= \|D_{T_{01}}\tilde{D}^*_{T_{01}}h + D_{T_{01}^c}\tilde{D}^*_{T_{01}^c}h\|_2 \\
&\le \|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 + \|D_{T_{01}^c}\tilde{D}^*_{T_{01}^c}h\|_2 \\
&\le \|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 + \sqrt{B}\,\|\tilde{D}^*_{T_{01}^c}h\|_2,
\end{aligned} \tag{21}$$

where $\tilde{D}^*_T = (\tilde{D}_T)^*$. To bound the norm of $h$, we need to bound $\|\tilde{D}^*_{T_{01}^c}h\|_2$ and $\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2$.
The proof then proceeds in the following three steps.

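Step 1: Bound the tail $\|\tilde{D}^*_{T_{01}^c}h\|_2$. Since $f$ and $\hat{f}$ are feasible and $\hat{f}$ is the minimizer, we have

$$\begin{aligned}
\|\tilde{D}^*_{T_0}f\|_1 + \|\tilde{D}^*_{T_0^c}f\|_1 = \|\tilde{D}^*f\|_1 &\ge \|\tilde{D}^*\hat{f}\|_1 = \|\tilde{D}^*f - \tilde{D}^*h\|_1 \\
&= \|\tilde{D}^*_{T_0}f - \tilde{D}^*_{T_0}h\|_1 + \|\tilde{D}^*_{T_0^c}f - \tilde{D}^*_{T_0^c}h\|_1 \\
&\ge \|\tilde{D}^*_{T_0}f\|_1 - \|\tilde{D}^*_{T_0}h\|_1 + \|\tilde{D}^*_{T_0^c}h\|_1 - \|\tilde{D}^*_{T_0^c}f\|_1.
\end{aligned}$$

This implies

$$\|\tilde{D}^*_{T_0^c}h\|_1 \le \|\tilde{D}^*_{T_0}h\|_1 + 2\|\tilde{D}^*_{T_0^c}f\|_1. \tag{22}$$

If $0 < b - a \le 3a$, then applying the Shifting Inequality (18) to the vectors
$[(\tilde{D}^*_{T_1}h)^*, (\tilde{D}^*_{T_{21}}h)^*, (\tilde{D}^*_{T_{22}}h)^*]^*$ and $[(\tilde{D}^*_{T_{(i-1)2}}h)^*, (\tilde{D}^*_{T_{i1}}h)^*, (\tilde{D}^*_{T_{i2}}h)^*]^*$
for $i = 3, 4, \ldots$, we have

$$\|\tilde{D}^*_{T_2}h\|_2 \le \frac{\|\tilde{D}^*_{T_1}h\|_1 + \|\tilde{D}^*_{T_{21}}h\|_1}{\sqrt{b}}, \quad \ldots, \quad \|\tilde{D}^*_{T_i}h\|_2 \le \frac{\|\tilde{D}^*_{T_{(i-1)2}}h\|_1 + \|\tilde{D}^*_{T_{i1}}h\|_1}{\sqrt{b}}.$$

It then follows that

$$\begin{aligned}
\sum_{i \ge 2} \|\tilde{D}^*_{T_i}h\|_2 &\le \frac{\|\tilde{D}^*_{T_0^c}h\|_1}{\sqrt{b}} \\
&\overset{(22)}{\le} \frac{\|\tilde{D}^*_{T_0}h\|_1}{\sqrt{b}} + \frac{2\|\tilde{D}^*_{T_0^c}f\|_1}{\sqrt{b}} \\
&\overset{\text{C.S.}}{\le} \sqrt{\frac{s}{b}}\,\|\tilde{D}^*_{T_0}h\|_2 + \frac{2\|\tilde{D}^*_{T_0^c}f\|_1}{\sqrt{b}} \\
&= \sqrt{\rho}\left(\|\tilde{D}^*_{T_0}h\|_2 + \eta\right) \\
&\le \sqrt{\rho}\left(\sqrt{\tilde{B}}\,\|h\|_2 + \eta\right),
\end{aligned}$$

where $\rho = s/b$, $\eta = 2\|\tilde{D}^*_{T_0^c}f\|_1/\sqrt{s}$, and C.S. stands for the Cauchy-Schwarz inequality. Hence,
$\|\tilde{D}^*_{T_{01}^c}h\|_2$ is bounded by

$$\|\tilde{D}^*_{T_{01}^c}h\|_2 \le \sum_{i \ge 2} \|\tilde{D}^*_{T_i}h\|_2 \le \sqrt{\rho}\left(\sqrt{\tilde{B}}\,\|h\|_2 + \eta\right). \tag{23}$$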

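Step 2: Show that $\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2$ is appropriately small. On the one hand,

$$\|\Phi h\|_2 = \|\Phi f - y - (\Phi\hat{f} - y)\|_2 \le \|\Phi f - y\|_2 + \|\Phi\hat{f} - y\|_2 \le 2\epsilon. \tag{24}$$

On the other hand,

$$\begin{aligned}
\|\Phi h\|_2 = \|\Phi D\tilde{D}^*h\|_2 &= \|\Phi D_{T_{01}}\tilde{D}^*_{T_{01}}h + \Phi D_{T_{01}^c}\tilde{D}^*_{T_{01}^c}h\|_2 \\
&\ge \|\Phi D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 - \sum_{i \ge 2} \|\Phi D_{T_i}\tilde{D}^*_{T_i}h\|_2 \\
&\overset{(12)}{\ge} \sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 - \sqrt{1 + \delta_b} \sum_{i \ge 2} \|D_{T_i}\tilde{D}^*_{T_i}h\|_2 \\
&\ge \sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 - \sqrt{1 + \delta_b} \sum_{i \ge 2} \|D_{T_i}\|_2 \|\tilde{D}^*_{T_i}h\|_2 \\
&\ge \sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 - \sqrt{(1 + \delta_b)B} \sum_{i \ge 2} \|\tilde{D}^*_{T_i}h\|_2 \\
&\overset{(23)}{\ge} \sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 - \sqrt{\rho(1 + \delta_b)B}\left(\sqrt{\tilde{B}}\,\|h\|_2 + \eta\right).
\end{aligned} \tag{25}$$

Combining (24) and (25) yields

$$\sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 \le 2\epsilon + \sqrt{\rho(1 + \delta_b)B}\left(\sqrt{\tilde{B}}\,\|h\|_2 + \eta\right). \tag{26}$$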



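Step 3: Bound the error of $h$. It follows from (21) and (23) that

$$\begin{aligned}
\|h\|_2 &\le \|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 + \sqrt{B}\,\|\tilde{D}^*_{T_{01}^c}h\|_2 \\
&\le \|D_{T_{01}}\tilde{D}^*_{T_{01}}h\|_2 + \sqrt{\rho B\tilde{B}}\,\|h\|_2 + \sqrt{\rho B}\cdot\eta.
\end{aligned} \tag{27}$$

Combining (26) with (27) yields

$$K_1\|h\|_2 \le 2\epsilon + K_2\eta, \tag{28}$$

where

$$K_1 = \sqrt{1 - \delta_{s+a}} - \sqrt{\rho B\tilde{B}(1 - \delta_{s+a})} - \sqrt{\rho B\tilde{B}(1 + \delta_b)},$$

$$K_2 = \sqrt{\rho B(1 - \delta_{s+a})} + \sqrt{\rho B(1 + \delta_b)}.$$

If $K_1$ is positive, then we have

$$\|h\|_2 \le \frac{2}{K_1}\epsilon + \frac{K_2}{K_1}\eta = C_0 \cdot \epsilon + C_1 \cdot \frac{\|\tilde{D}^*f - (\tilde{D}^*f)_s\|_1}{\sqrt{s}}, \tag{29}$$

where $C_0 = 2/K_1$ and $C_1 = 2K_2/K_1$. Finally, note that if

$$\left(1 - \sqrt{\rho B\tilde{B}}\right)^2 \cdot \delta_{s+a} + \rho B\tilde{B} \cdot \delta_b < 1 - 2\sqrt{\rho B\tilde{B}}, \tag{30}$$

then $K_1 > 0$. This completes the proof. □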

Remark 1: The D-RIP condition can now be $\delta_{2s} < 0.1398$ in the case of Parseval frames. Suppose
$D$ is a Parseval frame and the analysis operator $\tilde{D}$ is its canonical dual frame, i.e., $\tilde{D} = D$, as
in [15]. Then (19) becomes, since $B\tilde{B} = 1$,

$$(1 - \sqrt{\rho})^2 \cdot \delta_{s+a} + \rho \cdot \delta_b < 1 - 2\sqrt{\rho}. \tag{31}$$

Note that different choices of $a$ and $b$ may lead to different conditions. For example, let $a = 3s$, $b = 12s$,
and $\rho = s/b = 1/12$. Then (31) becomes

$$(13 - 4\sqrt{3}) \cdot \delta_{4s} + \delta_{12s} < 12 - 4\sqrt{3}. \tag{32}$$

By the fact that $\delta_{ks} \le k \cdot \delta_{2s}$ for positive integers $k$ and $s$ (Corollary 3.4 of [31]), (32) is satisfied
whenever $\delta_{2s} < (3 - \sqrt{3})/(16 - 4\sqrt{3}) \approx 0.1398$. This condition is weaker than the condition $\delta_{2s} < 0.08$
obtained in [15].
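
The arithmetic behind this threshold is easy to reproduce. The sketch below assumes that $a$ and $b$ are integer multiples of $s$ and that both D-RIP constants are bounded through $\delta_{ks} \le k \cdot \delta_{2s}$; the function name `theorem1_threshold` and its arguments are introduced here for illustration only.

```python
import math

def theorem1_threshold(a_over_s, b_over_s, frame_product=1.0):
    """Largest delta_{2s} for which the Theorem 1 condition is guaranteed.

    a_over_s, b_over_s : integers with a = a_over_s * s and b = b_over_s * s.
    frame_product      : the product of the upper frame bounds of D and its dual
                         (equal to 1 for a Parseval frame with its canonical dual).
    Uses delta_{s+a} <= (1 + a_over_s) * delta_{2s} and delta_b <= b_over_s * delta_{2s}.
    """
    rho = 1.0 / b_over_s
    x = math.sqrt(rho * frame_product)
    lhs_coeff = (1.0 - x) ** 2 * (1 + a_over_s) + rho * frame_product * b_over_s
    return (1.0 - 2.0 * x) / lhs_coeff

# a = 3s, b = 12s for a Parseval frame, as in Remark 1.
print(theorem1_threshold(3, 12))
```

Other admissible pairs $(a, b)$ with $0 < b - a \le 3a$ can be scanned with the same helper to see how the guaranteed threshold varies.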

Remark 2: When $D$ is a general frame and the analysis operator $\tilde{D}$ is its canonical dual frame,
i.e., $\tilde{D} = \bar{D} = (DD^*)^{-1}D$, then (19) may be expressed as

$$(1 - \sqrt{\rho\kappa})^2 \cdot \delta_{s+a} + \rho\kappa \cdot \delta_b < 1 - 2\sqrt{\rho\kappa}, \tag{33}$$

where $\kappa = B\tilde{B} = B/A$ is the ratio of the frame bounds. We see that this sufficient condition depends
not only on the D-RIP constants of $\Phi$ but also on the ratio of frame bounds $\kappa = B/A$. Furthermore,
as $\kappa$ increases, it leads to a stronger condition on $\Phi$. For instance, let $a = 7s$, $b = 8s$, and $\rho = 1/8$.
For different values of $\kappa$, e.g., $\kappa = 1$ and $\kappa = \sqrt{2}$, (33) becomes $\delta_{8s} < 0.5395$ and $\delta_{8s} < 0.3104$, respectively.
The former is obviously much weaker than the latter. Hence, from this point of view, whenever a
Parseval frame is allowed in specific applications, it makes sense to use the Parseval frame ($\kappa = 1$).

Remark 3: In general, when $D$ is a general frame and $\tilde{D}$ is an alternative dual frame of $D$, we see
that the product of the upper frame bounds $B\tilde{B}$ (of $D$ and $\tilde{D}$) is a factor in the sufficient condition.
Evidently, $B\tilde{B}$ plays a role similar to that of $\kappa$ in the case of the canonical dual. A larger $B\tilde{B}$ leads to a stronger
condition on $\Phi$.

Remark 4: The results obtained in Theorem 1 for bounded noise can be applied directly to Gaussian
noise, i.e., $z \sim \mathcal{N}(0, \sigma^2 I_m)$, because in this case $z$ belongs to a bounded set with large probability,
as the following lemma asserts.

Lemma 2. [8] The Gaussian error $z \sim \mathcal{N}(0, \sigma^2 I_m)$ satisfies

$$\Pr\left(\|z\|_2 \le \sigma\sqrt{m + 2\sqrt{m \log m}}\right) \ge 1 - \frac{1}{m}. \tag{34}$$

A combination of Theorem 1 and Lemma 2 leads to the following result for the Gaussian noise
case.
Corollary 2. (Gaussian Noise Case) Let $D$ be a general frame of $\mathbb{R}^n$ with frame bounds $0 < A \le B < \infty$.
Let $\tilde{D}$ be an alternative dual frame of $D$ with frame bounds $0 < \tilde{A} \le \tilde{B} < \infty$, and let
$\rho = s/b$. Suppose

$$\left(1 - \sqrt{\rho B\tilde{B}}\right)^2 \cdot \delta_{s+a} + \rho B\tilde{B} \cdot \delta_b < 1 - 2\sqrt{\rho B\tilde{B}} \tag{35}$$

holds for some positive integers $a$ and $b$ satisfying $0 < b - a \le 3a$. Then with probability at least
$1 - 1/m$, the solution $\hat{f}$ to (6) with $\epsilon = \sigma\sqrt{m + 2\sqrt{m \log m}}$ satisfies

$$\|\hat{f} - f\|_2 \le C_0 \cdot \sigma\sqrt{m + 2\sqrt{m \log m}} + C_1 \cdot \frac{\|\tilde{D}^*f - (\tilde{D}^*f)_s\|_1}{\sqrt{s}}, \tag{36}$$

where $C_0$ and $C_1$ are some constants and $(\tilde{D}^*f)_s$ denotes the vector consisting of the largest $s$ entries
of $\tilde{D}^*f$ in magnitude.
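
The choice of $\epsilon$ in the Gaussian case rests on the tail bound of Lemma 2, which can be sanity-checked by simulation. The following sketch is illustrative: the function names, the trial count, and the tested values of $m$ are assumptions of this example, and $\log$ is taken to be the natural logarithm.

```python
import numpy as np

def gaussian_noise_radius(m, sigma=1.0):
    """Radius sigma * sqrt(m + 2 sqrt(m log m)) used as epsilon for Gaussian noise."""
    return sigma * np.sqrt(m + 2.0 * np.sqrt(m * np.log(m)))

def coverage(m, sigma=1.0, trials=10000, seed=0):
    """Empirical probability that ||z||_2 stays within the radius above."""
    rng = np.random.default_rng(seed)
    z = sigma * rng.standard_normal((trials, m))
    return np.mean(np.linalg.norm(z, axis=1) <= gaussian_noise_radius(m, sigma))

for m in (16, 64, 256):
    print(f"m = {m:4d}: empirical coverage {coverage(m):.4f}, guaranteed at least {1 - 1 / m:.4f}")
```

In such experiments the empirical coverage typically exceeds the guaranteed level $1 - 1/m$ by a comfortable margin, which indicates that this $\epsilon$ is a somewhat conservative noise level.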

3.2 An Improvement in the Case of the Canonical Dual Frame 

We also notice that by using the explicit matrix structure of the canonical dual $\tilde{D} = \bar{D} =
(DD^*)^{-1}D$, the sufficient condition can be further improved. It seems to us that such an improvement
cannot easily be carried over to the general dual frame case.

Theorem 3. Let $D$ be a general frame of $\mathbb{R}^n$ with frame bounds $0 < A \le B < \infty$ and let $\bar{D}$ be the
canonical dual frame of $D$. Let $\kappa = B/A$ and $\rho = s/b$ such that $\rho < 1/\kappa$. Suppose

$$(1 - \rho\kappa)^2 \cdot \delta_{s+a} + \rho\kappa^3 \cdot \delta_b < (1 - \rho\kappa)^2 - \rho\kappa^3 \tag{37}$$

holds for some positive integers $a$ and $b$ satisfying $0 < b - a \le 3a$. Then the solution $\hat{f}$ to (6) (with
the canonical dual frame as the analysis operator) satisfies

$$\|\hat{f} - f\|_2 \le C_0 \cdot \epsilon + C_1 \cdot \frac{\|\bar{D}^*f - (\bar{D}^*f)_s\|_1}{\sqrt{s}}, \tag{38}$$

where $C_0$ and $C_1$ are some constants and $(\bar{D}^*f)_s$ denotes the vector consisting of the largest $s$ entries
of $\bar{D}^*f$ in magnitude.
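Proof. In this case, (23) and (26) respectively become

$$\|\bar{D}^*_{T_{01}^c}h\|_2 \le \sum_{i \ge 2} \|\bar{D}^*_{T_i}h\|_2 \le \sqrt{\rho}\left(\frac{1}{\sqrt{A}}\|h\|_2 + \eta\right) \tag{39}$$

and

$$\sqrt{1 - \delta_{s+a}}\,\|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2 \le 2\epsilon + \sqrt{\rho(1 + \delta_b)B}\left(\frac{1}{\sqrt{A}}\|h\|_2 + \eta\right). \tag{40}$$

We have

$$\begin{aligned}
\|h\|_2^2 = \|D\bar{D}^*h\|_2^2 &\le B\|\bar{D}^*h\|_2^2 = B\|\bar{D}^*_{T_{01}}h\|_2^2 + B\|\bar{D}^*_{T_{01}^c}h\|_2^2 \\
&= B\langle (DD^*)^{-1}h,\ D_{T_{01}}\bar{D}^*_{T_{01}}h \rangle + B\|\bar{D}^*_{T_{01}^c}h\|_2^2 \\
&\overset{\text{C.S.}}{\le} B\|(DD^*)^{-1}h\|_2 \|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2 + B\|\bar{D}^*_{T_{01}^c}h\|_2^2 \\
&\overset{(39)}{\le} \frac{B}{A}\|h\|_2 \|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2 + B\rho\left(\frac{1}{\sqrt{A}}\|h\|_2 + \eta\right)^2 \\
&= \frac{B}{A}\|h\|_2 \|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2 + \frac{B\rho}{A}\|h\|_2^2 + \frac{2B\rho}{\sqrt{A}}\|h\|_2 \cdot \eta + B\rho\eta^2.
\end{aligned} \tag{41}$$

Applying the fact that $uv \le \frac{cu^2}{2} + \frac{v^2}{2c}$ for any values $u$, $v$ and $c > 0$ twice to (41), we have

$$\|h\|_2^2 \le \frac{B}{A}\left(\frac{c_1\|h\|_2^2}{2} + \frac{\|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2^2}{2c_1}\right) + \frac{B\rho}{A}\|h\|_2^2 + \frac{2B\rho}{\sqrt{A}}\left(\frac{c_2\|h\|_2^2}{2} + \frac{\eta^2}{2c_2}\right) + B\rho\eta^2,$$

where $c_1, c_2 > 0$. Letting $\kappa = B/A$ and simplifying the above inequality yields

$$\left(1 - \frac{c_1\kappa}{2} - \rho\kappa - c_2\rho\sqrt{\kappa B}\right)\|h\|_2^2 \le \frac{\kappa}{2c_1}\|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2^2 + \left(\frac{\rho\sqrt{\kappa B}}{c_2} + \rho B\right)\eta^2.$$

Using the fact that $\sqrt{u^2 + v^2} \le u + v$ for $u, v \ge 0$, we obtain

$$\|h\|_2 \sqrt{1 - \frac{c_1\kappa}{2} - \rho\kappa - c_2\rho\sqrt{\kappa B}} \le \sqrt{\frac{\kappa}{2c_1}}\,\|D_{T_{01}}\bar{D}^*_{T_{01}}h\|_2 + \eta\sqrt{\frac{\rho\sqrt{\kappa B}}{c_2} + \rho B}. \tag{42}$$

Here we have assumed that

$$1 - \frac{c_1\kappa}{2} - \rho\kappa - c_2\rho\sqrt{\kappa B} > 0. \tag{43}$$

Combining (40) with (42) yields

$$K_1\|h\|_2 \le 2\epsilon + K_2\eta, \tag{44}$$

where

$$K_1 = \sqrt{\frac{2c_1}{\kappa}(1 - \delta_{s+a})\left(1 - \frac{c_1\kappa}{2} - \rho\kappa - c_2\rho\sqrt{\kappa B}\right)} - \sqrt{\rho\kappa(1 + \delta_b)},$$

$$K_2 = \sqrt{\frac{2c_1}{\kappa}(1 - \delta_{s+a})\left(\frac{\rho\sqrt{\kappa B}}{c_2} + \rho B\right)} + \sqrt{\rho B(1 + \delta_b)}.$$

If $K_1$ is positive, then we have

$$\|h\|_2 \le \frac{2}{K_1}\epsilon + \frac{K_2}{K_1}\eta = C_0 \cdot \epsilon + C_1 \cdot \frac{\|\bar{D}^*f - (\bar{D}^*f)_s\|_1}{\sqrt{s}}, \tag{45}$$

where $C_0 = 2/K_1$ and $C_1 = 2K_2/K_1$. We now consider how to properly choose the parameters
$c_1, c_2 > 0$ such that $K_1$ is positive and (43) holds. Let $g(c_1, c_2) = 2c_1\left(1 - \frac{c_1\kappa}{2} - \rho\kappa - c_2\rho\sqrt{\kappa B}\right)$, $c_1, c_2 > 0$.
Note first that $g(c_1, c_2)$ decreases as $c_2$ increases. Thus we can take $c_2$ arbitrarily small, i.e.,
$c_2 \to 0^+$; then $g(c_1, c_2)$ reduces to $g(c_1) = 2c_1\left(1 - \frac{c_1\kappa}{2} - \rho\kappa\right)$. Further, $g(c_1)$ achieves its maximum
at $c_1^{\mathrm{opt}} = (1 - \rho\kappa)/\kappa$. Hence, we choose $c_1 = c_1^{\mathrm{opt}}$, and $K_1 > 0$ is guaranteed provided that

$$(1 - \rho\kappa)^2 \cdot \delta_{s+a} + \rho\kappa^3 \cdot \delta_b < (1 - \rho\kappa)^2 - \rho\kappa^3. \tag{46}$$

To guarantee that $c_1 > 0$ and that (43) holds, it is also required that

$$\rho < \frac{1}{\kappa}. \tag{47}$$

This completes the proof. □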

Remark 5: The D-RIP condition can now be $\delta_{2s} < 0.2$ in the case of Parseval frames. Suppose $D$
is a Parseval frame and the analysis operator $\tilde{D}$ is its canonical dual frame, i.e., $\tilde{D} = D$. Then (37)
becomes, since $\kappa = 1$,

$$(1 - \rho)^2 \cdot \delta_{s+a} + \rho \cdot \delta_b < (1 - \rho)^2 - \rho. \tag{48}$$

Again, different choices of $a$ and $b$ will lead to different conditions. For instance, let $a = s$, $b = 4s$,
and $\rho = s/b = 1/4 < 1$. Then (48) becomes

$$9\delta_{2s} + 4\delta_{4s} < 5, \tag{49}$$

which, since $\delta_{4s} \le 4\delta_{2s}$, is satisfied whenever $\delta_{2s} < 0.2$. Note also that a smaller $\delta_{4s}$ leads to smaller
constants in the error bound. For example, let $c_2 = 1/10$ and $c_1 = 1 - \rho - c_2\rho = 29/40$; then we have $C_0 = 29.1$ and
$C_1 = 66.5$ whenever $\delta_{4s} < 1/4$. Under the tighter restriction $\delta_{4s} < 1/8$, the constants
become $C_0 = 13.6$ and $C_1 = 32.5$.
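
The constants quoted in this remark can be approximately reproduced from the expressions for $K_1$ and $K_2$ in the proof of Theorem 3. The sketch below makes two assumptions that are not spelled out in the text: the D-RIP constants $\delta_{s+a} = \delta_{2s}$ and $\delta_b = \delta_{4s}$ are both set to the stated bound on $\delta_{4s}$ (a worst case, since $\delta_{2s} \le \delta_{4s}$), and the function name `theorem3_constants` is introduced here for illustration.

```python
import math

def theorem3_constants(delta_sa, delta_b, rho, c1, c2, kappa=1.0, B=1.0):
    """Constants C0 = 2/K1 and C1 = 2*K2/K1 from the proof of Theorem 3.

    delta_sa, delta_b : D-RIP constants at orders s+a and b.
    kappa, B          : frame-bound ratio B/A and upper frame bound
                        (both 1 for a Parseval frame).
    """
    slack = 1.0 - c1 * kappa / 2.0 - rho * kappa - c2 * rho * math.sqrt(kappa * B)
    K1 = (math.sqrt(2.0 * c1 / kappa * (1.0 - delta_sa) * slack)
          - math.sqrt(rho * kappa * (1.0 + delta_b)))
    K2 = (math.sqrt(2.0 * c1 / kappa * (1.0 - delta_sa)
                    * (rho * math.sqrt(kappa * B) / c2 + rho * B))
          + math.sqrt(rho * B * (1.0 + delta_b)))
    if K1 <= 0:
        raise ValueError("K1 must be positive for the bound to hold")
    return 2.0 / K1, 2.0 * K2 / K1

# Parseval frame, a = s, b = 4s, rho = 1/4, c2 = 1/10, c1 = 1 - rho - c2*rho = 29/40.
for delta in (1 / 4, 1 / 8):
    C0, C1 = theorem3_constants(delta, delta, rho=0.25, c1=29 / 40, c2=0.1)
    print(f"delta_4s = {delta}: C0 ~ {C0:.2f}, C1 ~ {C1:.2f}")
```

Under these assumptions the computed values agree with the constants reported in the remark to within rounding in the first decimal place.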

4 Optimal-dual-based $\ell_1$-analysis and an Iterative Algorithm

One of the applications of the general-dual-based $\ell_1$-analysis and its error bound analysis is in the
optimal-dual-based $\ell_1$-analysis approach, as we briefly discussed in the introduction. Recall that our
goal is to solve a constrained optimization problem of the form

$$f = \arg\min_{f \in \mathbb{R}^n,\ g \in \mathbb{R}^d} \|\bar{D}^*f + Pg\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \le \epsilon. \tag{50}$$

It is well known that this problem is difficult to solve numerically since the $\ell_1$ term involved in (50) is
nonsmooth and nonseparable. In this section, we focus on applying the split Bregman iteration [23]
and develop an iterative algorithm for solving the optimal-dual-based $\ell_1$-analysis problem. Since
our derivation of this algorithm makes use of the Bregman iteration, we include an outline of the
basics of this technique in Appendix A.

4.1 Optimal-dual-based $\ell_1$-analysis via Split Bregman Iteration

The goal of the split Bregman method is to extend the utility of the Bregman iteration to the
minimization of problems involving multiple $\ell_1$-regularization terms [23] and $\ell_1$-analysis [7]. Here,
we apply the split Bregman iteration to solve the optimal-dual-based $\ell_1$-analysis problem (50).
The basic idea is to introduce an intermediate variable $d$ such that $d = \bar{D}^*f + Pg$, so that the term
$\|\bar{D}^*f + Pg\|_1$ in (50) becomes separable and easy to minimize.

To solve (50), one can use the Bregman iteration (79) for the equality constrained version of
(50) with an early stopping criterion

$$\|\Phi f^k - y\|_2 \le \epsilon \tag{51}$$

to find a good approximate solution of (50). (For simplicity of notation, we write $f$ in place of $\hat{f}$
in this section.) This approach has already been used and discussed in,
for example, [7], [32], [5]. The equality constrained version of (50) is given by

$$f = \arg\min_{f \in \mathbb{R}^n,\ g \in \mathbb{R}^d} \|\bar{D}^*f + Pg\|_1 \quad \text{s.t.} \quad \Phi f = y. \tag{52}$$



Applying the Bregman iteration (79) to the constrained minimization problem (52), we obtain

$$\begin{aligned}
(f^{k+1}, g^{k+1}) &= \arg\min_{f,\, g} \|\bar{D}^*f + Pg\|_1 + \frac{\mu}{2}\|\Phi f - y + c^k\|_2^2, \\
c^{k+1} &= c^k + (\Phi f^{k+1} - y),
\end{aligned} \tag{53}$$

for $k = 0, 1, \ldots$, starting with $c^0 = 0$, $g^0 = 0$, and $f^0 = 0$. In the first step, we have to solve a
subproblem of the form

$$\min_{f,\, g} \|\bar{D}^*f + Pg\|_1 + \frac{\mu}{2}\|\Phi f - y + c^k\|_2^2. \tag{54}$$

This problem is equivalent to

$$\min_{f,\, g,\, d} \|d\|_1 + \frac{\mu}{2}\|\Phi f - y + c^k\|_2^2 \quad \text{s.t.} \quad d = \bar{D}^*f + Pg. \tag{55}$$

Again applying the Bregman iteration (79) to (55), we have the following two-phase algorithm for
solving the subproblem (54):

$$\begin{aligned}
(f^{k+1}, d^{k+1}, g^{k+1}) &= \arg\min_{f,\, d,\, g} \|d\|_1 + \frac{\mu}{2}\|\Phi f - y + c^k\|_2^2 + \frac{\lambda}{2}\|\bar{D}^*f + Pg - d + b^k\|_2^2, \\
b^{k+1} &= b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1}).
\end{aligned} \tag{56}$$

Since we have split the $\ell_1$ and $\ell_2$ components of the subproblem involved in (56), we can perform
this minimization efficiently by iteratively minimizing with respect to $f$, $d$, and $g$ separately. Thus
we arrive at the following three steps:

$$\text{Step 1:} \quad f^{k+1} = \arg\min_f \frac{\mu}{2}\|\Phi f - y + c^k\|_2^2 + \frac{\lambda}{2}\|\bar{D}^*f + Pg^k - d^k + b^k\|_2^2, \tag{57}$$

$$\text{Step 2:} \quad d^{k+1} = \arg\min_d \|d\|_1 + \frac{\lambda}{2}\|d - \bar{D}^*f^{k+1} - Pg^k - b^k\|_2^2, \tag{58}$$

$$\text{Step 3:} \quad g^{k+1} = \arg\min_g \frac{\lambda}{2}\|Pg + \bar{D}^*f^{k+1} - d^{k+1} + b^k\|_2^2. \tag{59}$$
In Step 1, because we have decoupled $f$ from the $\ell_1$ portion of the problem, the optimization
problem is now differentiable. The optimality conditions for (57) yield

$$\mu\Phi^*(\Phi f - y + c^k) + \lambda\bar{D}(\bar{D}^*f + Pg^k - d^k + b^k) = 0. \tag{60}$$

Thus we can compute

$$f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^k - Pg^k - b^k)\right]. \tag{61}$$

In Step 2, there is no coupling between elements of $d$. This problem can be solved by a simple
soft shrinkage, i.e.,

$$d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{k+1} + Pg^k + b^k,\ 1/\lambda), \tag{62}$$

where the soft shrinkage operator is defined componentwise as

$$\mathrm{shrink}(w_j, 1/\lambda) = \mathrm{sign}(w_j) \cdot \max(|w_j| - 1/\lambda,\ 0).$$


In Step 3, the optimality conditions for (59) lead to

$$\lambda P(Pg + \bar{D}^*f^{k+1} - d^{k+1} + b^k) = 0. \tag{63}$$

Since only $Pg^k$ is involved in the update of $f^k$, $d^k$, and $b^k$, it is enough to derive an updating
formula for $Pg^k$:

$$Pg^{k+1} = P(d^{k+1} - \bar{D}^*f^{k+1} - b^k). \tag{64}$$

Therefore, we obtain the unconstrained split Bregman algorithm for solving the subproblem (54)
as follows:

$$\begin{aligned}
&\text{for } n = 1 \text{ to } N \\
&\qquad f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^{new} - Pg^{new} - b^k)\right], \\
&\qquad d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{new} + Pg^{new} + b^k,\ 1/\lambda), \\
&\qquad Pg^{k+1} = P(d^{new} - \bar{D}^*f^{new} - b^k), \\
&\text{end} \\
&b^{k+1} = b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1}),
\end{aligned} \tag{65}$$
where $(\cdot)^{new}$ denotes either $(\cdot)^{k+1}$ if it is available or $(\cdot)^k$ otherwise.

Ideally, we need to run infinitely many iterations ($N \to \infty$) to obtain a convergent solution for the
subproblem involved in (56). However, as pointed out in [23], it is not desirable to solve this
subproblem to full convergence. Intuitively, the reason for this is that if the error in our solution for
this subproblem is small compared to $\|b^k - b^*\|_2$, where $b^*$ is the "true $b$", then this extra precision
will be "wasted" when the Bregman parameter is updated. In fact, it was found empirically in [23]
that for many applications optimal efficiency is obtained when only one iteration of the inner loop
is performed (i.e., $N = 1$ in (65)). When $N = 1$, the unconstrained split Bregman iteration
reduces to

$$\begin{aligned}
f^{k+1} &= (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^k - Pg^k - b^k)\right], \\
d^{k+1} &= \mathrm{shrink}(\bar{D}^*f^{k+1} + Pg^k + b^k,\ 1/\lambda), \\
Pg^{k+1} &= P(d^{k+1} - \bar{D}^*f^{k+1} - b^k), \\
b^{k+1} &= b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1}).
\end{aligned} \tag{66}$$

Combining this inner solver with the outer iteration (53), we obtain the constrained split Bregman
method for (52) as follows:

$$\begin{aligned}
&\text{for } n = 1 \text{ to nInner} \\
&\qquad f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^{new} - Pg^{new} - b^{new})\right], \\
&\qquad d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{new} + Pg^{new} + b^{new},\ 1/\lambda), \\
&\qquad Pg^{k+1} = P(d^{new} - \bar{D}^*f^{new} - b^{new}), \\
&\qquad b^{k+1} = b^{new} + (\bar{D}^*f^{new} + Pg^{new} - d^{new}), \\
&\text{end} \\
&c^{k+1} = c^k + (\Phi f^{k+1} - y),
\end{aligned} \tag{67}$$

where nInner denotes the number of inner loops. A formal statement of the split Bregman iteration
for optimal-dual-based $\ell_1$-analysis is given in Algorithm 1, in which $f$ denotes the recovered signal
and $d$ is the recovered coefficient vector.
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Algorithm 1: Split Bregman Iteration for optimal-dual-based ^-analysis 
Initialization: f° = 0, d° = b° = Pg° = 0, c° = 0, \i > 0, A > 0, nOuter, nInner,tol; 
while k < nOuter and ||$f fc — y||2 > tol do 
for n = 1 : nlnner do 

f fc+i = ^$>*<s> + XD'D*)- 1 \/i9*(y - c k ) + XY)(d new - Pg new - b new )}; 
d k+1 = shrink(D*f neu; + Pg new + b new , 1/A); 

_ ^new _|_ ^Yy*^new _|_ pgneui ^new^. 

end 

f.k+1 =( .k + ($fk+l _ y ). 

Increase k; 
end 
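The loop structure of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the data are real-valued, matrices are dense, the linear system is solved directly, and $\mathbf{P}$ is taken to be the orthogonal projector onto the null space of $\mathbf{D}$ (its definition lies outside this section). The function name, argument names, and toy problem sizes are hypothetical.

```python
import numpy as np

def shrink(w, t):
    """Componentwise soft shrinkage: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def split_bregman_optimal_dual(Phi, D, y, lam=1.0, mu=1.0,
                               n_outer=200, n_inner=30, tol=1e-6):
    """Sketch of Algorithm 1 for real-valued data and dense matrices."""
    n, d = D.shape
    # Assumption: P is the orthogonal projector onto the null space of D.
    P = np.eye(d) - D.T @ np.linalg.solve(D @ D.T, D)
    A = mu * Phi.T @ Phi + lam * D @ D.T       # system matrix of the f-update
    f, c = np.zeros(n), np.zeros(Phi.shape[0])
    dv, b, Pg = np.zeros(d), np.zeros(d), np.zeros(d)
    k = 0
    while k < n_outer and np.linalg.norm(Phi @ f - y) > tol:
        for _ in range(n_inner):
            rhs = mu * Phi.T @ (y - c) + lam * D @ (dv - Pg - b)
            f = np.linalg.solve(A, rhs)
            Df = D.T @ f                        # reused by the next three updates
            dv = shrink(Df + Pg + b, 1.0 / lam)
            Pg = P @ (dv - Df - b)
            b = b + (Df + Pg - dv)
        c = c + (Phi @ f - y)                   # outer Bregman update
        k += 1
    return f, dv

# Toy problem (hypothetical sizes): Parseval frame D = [I, H] / sqrt(2), H orthonormal.
rng = np.random.default_rng(0)
n, m = 16, 8
H, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.hstack([np.eye(n), H]) / np.sqrt(2.0)
Phi = rng.standard_normal((m, n))
x = np.zeros(2 * n)
x[[1, 20]] = [1.0, -2.0]
y = Phi @ (D @ x)

f_hat, d_hat = split_bregman_optimal_dual(Phi, D, y, n_outer=50, n_inner=10)
rel_residual = np.linalg.norm(Phi @ f_hat - y) / np.linalg.norm(y)
```

Replacing `Pg = P @ (...)` by `Pg = np.zeros(d)` gives the $\mathbf{P}\mathbf{g} = \mathbf{0}$ variant used for the standard analysis approach. For large problems the dense solve would be replaced by a fast application of the inverse, as discussed in the complexity analysis.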

Remark 6: If $\mathbf{D}$ is a Parseval frame and $\mathbf{P}\mathbf{g} = \mathbf{0}$, then Algorithm 1 reduces to the split Bregman iteration for the standard $\ell_1$-analysis approach as discussed in [7].

4.2 Computational Complexity Analysis 

We briefly discuss the computational complexity of Algorithm 1 in this subsection. For simplicity, we assume that $\mathbf{D}$ is a Parseval frame, since Parseval frames are often favored in practice. Let $\mathbf{Q} = (\mu\Phi^*\Phi + \lambda\mathbf{I}_n)^{-1}$. Define $C_\Phi$, $C_D$, and $C_Q$ to be the complexity of applying $\Phi$ or $\Phi^*$, $\mathbf{D}$ or $\mathbf{D}^*$, and $\mathbf{Q}$ to a vector, respectively. The complexity of the first step in the inner loop is $C_Q + C_\Phi + C_D$. Here the cost of vector operations is omitted, since most of the work is in matrix-vector products for large-scale problems. Steps 2 and 3 in the inner loop require the application of $\mathbf{D}$ or $\mathbf{D}^*$ one and two times, respectively (the matrix-vector product $\mathbf{D}^*\mathbf{f}^{\mathrm{new}}$ from the $\mathbf{d}$-update can be reused). The last step in the inner loop involves only vector operations. Hence, the total complexity of a single inner loop is $C_Q + C_\Phi + 4C_D$. Furthermore, the total cost for an outer iteration is nInner $\times\ (C_Q + C_\Phi + 4C_D) + C_\Phi$.

The calculations above are in some sense overly pessimistic. In compressed sensing applications, one often encounters a matrix $\Phi$ that is a submatrix of a unitary transform, which admits easy storage and fast multiplication. An important example is the partial Discrete Fourier Transform (DFT). By applying the matrix inversion lemma, it is not hard to show that $\mathbf{Q} = \frac{1}{\lambda}\bigl(\mathbf{I}_n - \frac{\mu}{\mu+\lambda}\Phi^*\Phi\bigr)$. Thus computing $\mathbf{f}^{k+1}$ in the inner loop is cheap, since no matrix inversion is required. In this case, the total costs for a single inner loop and an outer iteration become $2C_\Phi + 4C_D$ and nInner $\times\ (2C_\Phi + 4C_D) + C_\Phi$, respectively. Another important example in compressed sensing is when $\Phi$ is a random matrix. It is well known that in this case the eigenvalues of $\Phi^*\Phi$ are well clustered. Then applying $\mathbf{Q} = (\mu\Phi^*\Phi + \lambda\mathbf{I}_n)^{-1}$ to a vector can be computed very efficiently via a few conjugate gradient (CG) steps [2].
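The closed form obtained from the matrix inversion lemma can be checked numerically. The sketch below assumes that $\Phi$ consists of $m$ rows of a unitary DFT matrix, so that $\Phi\Phi^* = \mathbf{I}_m$; under that assumption the lemma gives $(\mu\Phi^*\Phi + \lambda\mathbf{I}_n)^{-1} = \frac{1}{\lambda}\bigl(\mathbf{I}_n - \frac{\mu}{\mu+\lambda}\Phi^*\Phi\bigr)$. The sizes and parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 6
lam, mu = 1.5, 0.7

# Partial DFT: m randomly chosen rows of the unitary n x n DFT matrix.
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
rows = rng.choice(n, size=m, replace=False)
Phi = F[rows, :]

# Direct inversion versus the inversion-free closed form.
Q_direct = np.linalg.inv(mu * Phi.conj().T @ Phi + lam * np.eye(n))
Q_lemma = (np.eye(n) - (mu / (mu + lam)) * (Phi.conj().T @ Phi)) / lam

max_diff = np.max(np.abs(Q_direct - Q_lemma))
```

In practice one would never form these matrices: applying the closed form to a vector needs only one product with $\Phi$, one with $\Phi^*$, and a few vector operations.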

As discussed earlier, if $\mathbf{P}\mathbf{g} = \mathbf{0}$, then Algorithm 1 reduces to the split Bregman iteration for the standard $\ell_1$-analysis approach. Evidently, the corresponding complexity for a single inner loop reduces to $C_Q + C_\Phi + 2C_D$ (Step 3 disappears in this case). This means that the cost of an inner loop decreases by $2C_D$. It should be pointed out that, in practical applications, there is often a fast algorithm for applying $\mathbf{D}$ and $\mathbf{D}^*$, e.g., a fast wavelet transform or a fast short-time Fourier transform [30], which makes applying $\mathbf{D}$ and $\mathbf{D}^*$ inexpensive.

5 Numerical Results 

In this section, we present some numerical experiments illustrating the effectiveness of signal recovery via the optimal-dual-based $\ell_1$-analysis approach. Our results confirm that when signals are sparse with respect to redundant frames, the optimal-dual-based $\ell_1$-analysis approach often achieves better recovery performance than the standard $\ell_1$-analysis method, and that this recovery is robust with respect to noise.

In these experiments, we use two types of frames: Gabor frames and a concatenation of the coordinate and Fourier bases. The optimal-dual-based $\ell_1$-analysis problems are solved by Algorithm 1, while the standard $\ell_1$-analysis problems are solved by Algorithm 1 with $\mathbf{P}\mathbf{g} = \mathbf{0}$. The sensing matrix $\Phi$ is a Gaussian matrix with $m = 32$, $n = 128$. The noise $\mathbf{z}$ is white Gaussian with zero mean and second-order moment $\sigma^2\mathbf{I}_m$.

Example 1: Gabor Frames. Recall that for a window function $g$ and positive time-frequency shift parameters $\alpha$ and $\beta$, the Gabor frame is given by

\[
\bigl\{ g_{l,k}(t) = g(t - k\alpha)\, e^{2\pi i l \beta t} \bigr\}_{l,k}. \tag{68}
\]

For many imaging systems such as radar and sonar, the received signal $f$ often has the form

\[
f(t) = \sum_{k=1}^{s} a_k\, g(t - t_k)\, e^{i\omega_k t}. \tag{69}
\]

Evidently, if $s$ is small, $f$ is sparse with respect to some Gabor frame. In this experiment, we construct a Gabor dictionary with Gaussian windows, oversampled by a factor of 20 so that $d = 20 \times n = 2560$. The tested signal $\mathbf{f}$ is sparse with respect to the constructed Gabor frame with sparsity $s = \mathrm{ceil}(0.2 \times m) = 7$. The positions of the nonzero entries of the coefficient vector $\mathbf{x}$ are selected uniformly at random, and each nonzero value is sampled from the standard Gaussian distribution. We set $\lambda = \mu = 1$, tol $= 10^{-6}$, and nOuter $= 200$ in Algorithm 1.

Figure 1 shows the relative error vs. outer iteration number for both approaches in the noiseless case². It is not hard to see that the optimal-dual-based $\ell_1$-analysis approach is more effective than the standard $\ell_1$-analysis approach. This is because the optimization of the former is not only over the signal space but also over all dual frames of $\mathbf{D}$. In other words, there exists some optimal dual frame $\mathbf{D}_o$ which produces sparser coefficients than the canonical dual frame does for the tested signal. Since $\mathbf{D}_o$ is also a dual frame, it then follows from (7) that a better recovery performance can be achieved by the optimal-dual-based $\ell_1$-analysis approach.

² The same setting was tested many times with randomly generated examples (as detailed above). The test results are similar to those in Figure 1. To simplify the exposition, we show the result for only one random instance.

[Figure 1 panels: Gabor Frame Case, nInner = 10 (left) and nInner = 30 (right).]

Figure 1: Relative error vs. outer iteration number (without noise). The relative error at iteration $k$ is defined as $\|\mathbf{f} - \mathbf{f}^k\|_2 / \|\mathbf{f}\|_2$, where $\mathbf{f}^k$ is the approximation at iteration $k$ and $\mathbf{f}$ is the true solution. The optimal-dual-based $\ell_1$-analysis problems are solved by Algorithm 1, while the standard $\ell_1$-analysis problems are solved by Algorithm 1 with $\mathbf{P}\mathbf{g} = \mathbf{0}$. Left: Results for nInner = 10. Right: Results for nInner = 30.

The convergence performance of Algorithm 1 can also be observed in Figure 1. The proposed algorithm converges quickly for the first several iterations, but then slows down as it approaches the true solution. It is also evident that as nInner increases, the proposed algorithm requires fewer outer iterations to converge. This is because the subproblem involved in (53) is solved more accurately as nInner increases, so fewer outer Bregman updates are needed to reach the steady state. It is worth noting that as nInner increases, the corresponding complexity of an outer iteration also increases.

Our next simulation shows the robustness of the optimal-dual-based $\ell_1$-analysis with respect to noise in the measurements. Figure 2 shows the recovery error as a function of the noise level. As expected, the relation is linear. We also see that the constant $C_0$ in Theorem 1 for the optimal-dual-based $\ell_1$-analysis is larger than that for the standard $\ell_1$-analysis. But the overall performance of the optimal-dual-based method is still much better.

[Figure 2 panel: Gabor Frame Case.]

Figure 2: Relative recovery error vs. relative noise level, averaged over 5 trials. The relative recovery error is defined as $\|\mathbf{f} - \hat{\mathbf{f}}\|_2 / \|\mathbf{f}\|_2$ and the relative noise level is defined as $\sqrt{m}\,\sigma / \|\Phi\mathbf{f}\|_2$. The sparsity level is $s = \mathrm{ceil}(0.2 \times m) = 7$. We set $\lambda = \mu = 1$, tol $= 10^{-6}$, nInner = 30, and nOuter = 200 in Algorithm 1.

We also test the performance of the optimal-dual-based $\ell_1$-analysis with respect to the sparsity level of the coefficient vector $\mathbf{x}$. Figure 3 shows that the optimal-dual-based $\ell_1$-analysis outperforms the standard $\ell_1$-analysis at different sparsity levels. The plot also shows that the performance curve of the optimal-dual-based $\ell_1$-analysis exhibits a threshold effect. When $\rho = s/m < 0.2$, the optimal-dual-based $\ell_1$-analysis recovers the signal accurately. When $\rho > 0.2$, the performance degrades as $\rho$ increases.

Example 2: Concatenations. In many applications, signals of interest are sparse over several orthonormal bases (or frames), so it is natural to use a dictionary $\mathbf{D}$ consisting of a concatenation of these bases (or frames). In this experiment, we consider a dictionary consisting of the coordinate and Fourier bases, i.e., $\mathbf{D} = [\mathbf{I}, \mathbf{F}]$. The tested signal $\mathbf{f}$ is a linear combination of spikes and sinusoids with sparsity $s = \mathrm{ceil}(0.2 \times m) = 7$. The positions of the nonzero entries of $\mathbf{x}$ are selected uniformly at random, and each nonzero value is sampled from the standard Gaussian distribution. We set $\lambda = \mu = 1$, tol $= 10^{-12}$, and nOuter $= 100$ in Algorithm 1.

[Figure 3 panel: Gabor Frame Case.]

Figure 3: Relative recovery error vs. relative sparsity level of $\mathbf{x}$, averaged over 100 trials. The relative recovery error is defined as $\|\mathbf{f} - \hat{\mathbf{f}}\|_2 / \|\mathbf{f}\|_2$ and the relative sparsity level $\rho$ is defined as $\rho = s/m$. No noise ($\sigma^2 = 0$). The parameters for Algorithm 1 are the same as in Figure 2.

Figure 4 shows that the optimal-dual-based $\ell_1$-analysis approach achieves much better recovery performance than the standard $\ell_1$-analysis approach. The latter fundamentally fails, with a relative error of about 80%. Such a failure is not surprising, since $\mathbf{D}^*\mathbf{f}$ in this case is not at all sparse. This is because, in this example, a component that is sparse in one basis is not at all sparse in the other.
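The lack of sparsity of $\mathbf{D}^*\mathbf{f}$ for $\mathbf{D} = [\mathbf{I}, \mathbf{F}]$ is easy to observe numerically. The sketch below is illustrative only: it takes $\mathbf{F}$ to be the unitary DFT matrix and, as an assumption made so that both kinds of atoms are present, fixes 4 spikes and 3 sinusoids instead of drawing all 7 positions uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
D = np.hstack([np.eye(n), F])            # dictionary D = [I, F], size n x 2n

# Assumption for illustration: 4 spikes and 3 sinusoids (s = 7 in total).
x = np.zeros(2 * n)
spikes = rng.choice(n, size=4, replace=False)
sinusoids = n + rng.choice(n, size=3, replace=False)
x[spikes] = rng.standard_normal(4)
x[sinusoids] = rng.standard_normal(3)

f = D @ x
analysis = D.conj().T @ f                # analysis coefficients D^* f

nnz_x = np.count_nonzero(x)
nnz_analysis = int(np.sum(np.abs(analysis) > 1e-8))
print(nnz_x, nnz_analysis)
```

Since $\mathbf{D}\mathbf{D}^* = 2\mathbf{I}$ here, the canonical dual frame is $\mathbf{D}/2$, so the canonical analysis coefficients differ from $\mathbf{D}^*\mathbf{f}$ only by a factor of $1/2$ and are equally non-sparse: a spike spreads over all Fourier coefficients and a sinusoid over all time samples.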

Figures 5 and 6 show the performance of the optimal-dual-based $\ell_1$-analysis with respect to the noise level and the sparsity level for the $\mathbf{I} + \mathbf{F}$ case, respectively. The results are similar to those for the Gabor frame case. We also see that the standard $\ell_1$-analysis fails at all noise levels and sparsity levels in this case.

[Figure 4 panels: I + F Case, nInner = 5 (left) and nInner = 15 (right).]

Figure 4: Relative error vs. outer iteration number (without noise). The relative error at iteration $k$ is defined as $\|\mathbf{f} - \mathbf{f}^k\|_2 / \|\mathbf{f}\|_2$, where $\mathbf{f}^k$ is the approximation at iteration $k$ and $\mathbf{f}$ is the true solution. The optimal-dual-based $\ell_1$-analysis problems are solved by Algorithm 1, while the standard $\ell_1$-analysis problems are solved by Algorithm 1 with $\mathbf{P}\mathbf{g} = \mathbf{0}$. Left: Results for nInner = 5. Right: Results for nInner = 15.

[Figure 5 panel: I + F Case.]

Figure 5: Relative recovery error vs. relative noise level, averaged over 5 trials. The relative recovery error is defined as $\|\mathbf{f} - \hat{\mathbf{f}}\|_2 / \|\mathbf{f}\|_2$ and the relative noise level is defined as $\sqrt{m}\,\sigma / \|\Phi\mathbf{f}\|_2$. The sparsity level is $s = \mathrm{ceil}(0.2 \times m) = 7$. We set $\lambda = \mu = 1$, tol $= 10^{-12}$, nInner = 15, and nOuter = 100 in Algorithm 1.

[Figure 6 panel: I + F Case.]

Figure 6: Relative recovery error vs. relative sparsity level of $\mathbf{x}$, averaged over 100 trials. The relative recovery error is defined as $\|\mathbf{f} - \hat{\mathbf{f}}\|_2 / \|\mathbf{f}\|_2$ and the relative sparsity level $\rho$ is defined as $\rho = s/m$. No noise ($\sigma^2 = 0$). The parameters for Algorithm 1 are the same as in Figure 5.

6 Conclusions

We extend the $\ell_1$-analysis approach to a more general case in which the analysis operator can be any dual frame of $\mathbf{D}$. We call it the general-dual-based approach. An error performance bound is established, and improved sufficient signal recovery conditions are provided. To demonstrate the effectiveness of the general-dual-based approach, we also propose an optimal-dual-based $\ell_1$-analysis approach to recover the signal directly. The optimization of this method is not only over the signal space but also over all dual frames of $\mathbf{D}$. We have seen that when signals are sparse with respect to frames that are redundant and coherent, this optimal-dual-based approach often achieves better recovery performance than the standard $\ell_1$-analysis. By applying the split Bregman iteration, we develop an iterative algorithm for solving the optimal-dual-based $\ell_1$-analysis problem. The proposed algorithm is easy to code and very fast when proper parameter values are used. Our ongoing work includes the performance analysis of the $\ell_1$-synthesis approach by virtue of the principle of the optimal-dual-based $\ell_1$-analysis approach we proposed, and further refinements of the algorithm.

A The Basics of the Bregman Iteration

The Bregman iteration is a technique that originated in functional analysis for finding extrema of convex functionals [4]. The Bregman iteration was first introduced to image processing in [32], where it was applied to total variation (TV) denoising. Then, in [6], [34], it was shown to be remarkably successful for $\ell_1$ minimization problems in compressed sensing. Here we briefly review this technique. More details about the Bregman iteration can be found in, e.g., [6], [7], [32], [34].

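The Bregman iteration relies on the concept of the Bregman distance [1]. The Bregman distance of a convex function $J(\mathbf{u})$ between points $\mathbf{u}$ and $\mathbf{v}$ is defined as

\[
B_J^{\mathbf{p}}(\mathbf{u}, \mathbf{v}) = J(\mathbf{u}) - J(\mathbf{v}) - \langle \mathbf{u} - \mathbf{v}, \mathbf{p} \rangle, \tag{70}
\]

where $\mathbf{p} \in \partial J(\mathbf{v})$ is some subgradient in the subdifferential of $J$ at the point $\mathbf{v}$. Clearly, $B_J^{\mathbf{p}}(\mathbf{u}, \mathbf{v})$ is not a distance in the usual sense, since $B_J^{\mathbf{p}}(\mathbf{u}, \mathbf{v}) \neq B_J^{\mathbf{p}}(\mathbf{v}, \mathbf{u})$ in general. However, it does measure the closeness between $\mathbf{u}$ and $\mathbf{v}$ in the sense that $B_J^{\mathbf{p}}(\mathbf{u}, \mathbf{v}) \geq 0$ and $B_J^{\mathbf{p}}(\mathbf{u}, \mathbf{v}) \geq B_J^{\mathbf{p}}(\mathbf{w}, \mathbf{v})$ for all points $\mathbf{w}$ on the line segment connecting $\mathbf{u}$ and $\mathbf{v}$.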
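A small numerical example may help fix the definition. The sketch below evaluates the Bregman distance for the particular choice $J(\mathbf{u}) = \|\mathbf{u}\|_1$, an assumption made for illustration, with the subgradient $\mathbf{p} = \mathrm{sign}(\mathbf{v})$ (which is a valid element of $\partial J(\mathbf{v})$, taking the value 0 where $v_j = 0$).

```python
import numpy as np

def bregman_l1(u, v):
    """Bregman distance of J = ||.||_1 between u and v, using p = sign(v)."""
    p = np.sign(v)                      # a valid subgradient of ||.||_1 at v
    return np.sum(np.abs(u)) - np.sum(np.abs(v)) - np.dot(u - v, p)

u = np.array([1.0, -2.0, 0.0])
v = np.array([-1.0, 1.0, 3.0])
w = 0.5 * (u + v)                       # midpoint of the segment from u to v

d_uv = bregman_l1(u, v)                 # B(u, v)
d_vu = bregman_l1(v, u)                 # B(v, u), generally different from B(u, v)
d_wv = bregman_l1(w, v)                 # B(w, v), no larger than B(u, v)
```

For these vectors the three values are 6, 7, and 1, which illustrates the asymmetry, the nonnegativity, and the monotonicity along the segment described above.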

First, consider the following unconstrained optimization problem

\[
\min_{\mathbf{u}}\ J(\mathbf{u}) + H(\mathbf{u}), \tag{71}
\]

where $J(\mathbf{u})$ is some convex function and $H(\mathbf{u})$ is some convex and differentiable function with $\min_{\mathbf{u}} H(\mathbf{u}) = 0$.

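Instead of directly solving (71), the Bregman iteration iteratively solves

\[
\begin{aligned}
\mathbf{u}^{k+1} &= \arg\min_{\mathbf{u}}\ B_J^{\mathbf{p}^k}(\mathbf{u}, \mathbf{u}^k) + H(\mathbf{u}) \\
&= \arg\min_{\mathbf{u}}\ J(\mathbf{u}) - J(\mathbf{u}^k) - \langle \mathbf{u} - \mathbf{u}^k, \mathbf{p}^k \rangle + H(\mathbf{u}),
\end{aligned}
\tag{72}
\]

for $k = 0, 1, \ldots$, starting from $\mathbf{u}^0 = \mathbf{0}$ and $\mathbf{p}^0 = \mathbf{0}$. In (72), the updating formula for $\mathbf{p}^k$ is based on the optimality conditions of (72). Since $\mathbf{u}^{k+1}$ minimizes (72), we have $\mathbf{0} \in \partial\{B_J^{\mathbf{p}^k}(\mathbf{u}, \mathbf{u}^k) + H(\mathbf{u})\}$, where this subdifferential is evaluated at $\mathbf{u}^{k+1}$, i.e.,

\[
\mathbf{0} \in \partial J(\mathbf{u}^{k+1}) - \mathbf{p}^k + \nabla H(\mathbf{u}^{k+1}).
\]

This leads to

\[
\mathbf{p}^{k+1} = \mathbf{p}^k - \nabla H(\mathbf{u}^{k+1}) \in \partial J(\mathbf{u}^{k+1}). \tag{73}
\]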
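Combining (72) and (73) yields the Bregman iteration:

\[
\begin{aligned}
\mathbf{u}^{k+1} &= \arg\min_{\mathbf{u}}\ B_J^{\mathbf{p}^k}(\mathbf{u}, \mathbf{u}^k) + H(\mathbf{u}), \\
\mathbf{p}^{k+1} &= \mathbf{p}^k - \nabla H(\mathbf{u}^{k+1}),
\end{aligned}
\tag{74}
\]

for $k = 0, 1, \ldots$, starting with $\mathbf{u}^0 = \mathbf{0}$ and $\mathbf{p}^0 = \mathbf{0}$.

The convergence of the Bregman iteration (74) was analyzed in [32]. In particular, it was shown that, under fairly weak assumptions on $J(\mathbf{u})$ and $H(\mathbf{u})$, $H(\mathbf{u}^k) \to 0$ as $k \to \infty$.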
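The behavior $H(\mathbf{u}^k) \to 0$ can be observed on a toy instance. The sketch below assumes $J(\mathbf{u}) = \|\mathbf{u}\|_1$ and $H(\mathbf{u}) = \frac{1}{2}\|\mathbf{u} - \mathbf{f}\|_2^2$ for a fixed vector $\mathbf{f}$ (a hypothetical choice made because the $\mathbf{u}$-update then has the closed form $\mathrm{shrink}(\mathbf{f} + \mathbf{p}^k, 1)$); it is not the problem solved in the paper.

```python
import numpy as np

def shrink(w, t):
    """Componentwise soft shrinkage: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

f = np.array([3.0, -0.4, 0.05, 0.0])     # hypothetical data vector
u = np.zeros_like(f)                     # u^0 = 0
p = np.zeros_like(f)                     # p^0 = 0
H_values = []

for k in range(30):
    # u^{k+1} = argmin_u ||u||_1 - <u, p^k> + 0.5 * ||u - f||^2
    u = shrink(f + p, 1.0)
    # p^{k+1} = p^k - grad H(u^{k+1}) = p^k - (u^{k+1} - f)
    p = p - (u - f)
    H_values.append(0.5 * np.sum((u - f) ** 2))
```

The first iterate is the over-smoothed $\mathrm{shrink}(\mathbf{f}, 1)$; the subgradient update then feeds the residual back in, and the recorded values of $H$ decrease to zero as the iterates approach $\mathbf{f}$.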

We then show that the Bregman iteration can also be used to solve the general constrained convex minimization problem:

\[
\min_{\mathbf{u}}\ J(\mathbf{u}) \quad \text{s.t.} \quad \Phi\mathbf{u} = \mathbf{y}, \tag{75}
\]

where $J(\mathbf{u})$ denotes some convex function and $\Phi$ is some linear operator.

Traditionally, this problem may be solved by a continuation method, where we sequentially solve the unconstrained problems

\[
\min_{\mathbf{u}}\ J(\mathbf{u}) + \frac{\lambda_k}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2, \tag{76}
\]

where $\lambda_1 < \lambda_2 < \cdots < \lambda_k$ is an increasing sequence of penalty function weights [3]. In order to enforce that $\Phi\mathbf{u} \approx \mathbf{y}$, we must choose $\lambda_k$ to be extremely large. However, choosing a large value for $\lambda_k$ may make (76) extremely difficult to solve numerically [23].

The Bregman iteration provides another way to transform the constrained problem (75) into a series of unconstrained problems. To this end, we first convert (75) into an unconstrained optimization problem using a quadratic penalty function:

\[
\min_{\mathbf{u}}\ J(\mathbf{u}) + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2. \tag{77}
\]

Then we apply the Bregman iteration (74) and iteratively minimize:

\[
\begin{aligned}
\mathbf{u}^{k+1} &= \arg\min_{\mathbf{u}}\ B_J^{\mathbf{p}^k}(\mathbf{u}, \mathbf{u}^k) + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2, \\
\mathbf{p}^{k+1} &= \mathbf{p}^k - \lambda\Phi^*(\Phi\mathbf{u}^{k+1} - \mathbf{y}),
\end{aligned}
\tag{78}
\]

for $k = 0, 1, \ldots$, starting with $\mathbf{u}^0 = \mathbf{0}$ and $\mathbf{p}^0 = \mathbf{0}$.

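By a change of variable, this seemingly complicated iteration (78) can be reformulated into a simplified form [7]:

\[
\begin{aligned}
\mathbf{u}^{k+1} &= \arg\min_{\mathbf{u}}\ J(\mathbf{u}) + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y} + \mathbf{b}^k\|_2^2, \\
\mathbf{b}^{k+1} &= \mathbf{b}^k + (\Phi\mathbf{u}^{k+1} - \mathbf{y}),
\end{aligned}
\tag{79}
\]

for $k = 0, 1, \ldots$, starting with $\mathbf{b}^0 = \mathbf{0}$ and $\mathbf{u}^0 = \mathbf{0}$.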
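To see the simplified iteration at work, the following sketch runs it on a small constrained problem. The choice $J(\mathbf{u}) = \frac{1}{2}\|\mathbf{u}\|_2^2$ is an assumption made purely for illustration, because the $\mathbf{u}$-update is then a linear solve and the constrained minimizer is known to be the minimum-norm solution $\Phi^{+}\mathbf{y}$; the sizes, the seed, and the value of $\lambda$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 5, 12, 10.0
Phi = rng.standard_normal((m, n))       # hypothetical underdetermined operator
y = rng.standard_normal(m)

u = np.zeros(n)                         # u^0 = 0
b = np.zeros(m)                         # b^0 = 0
M = np.eye(n) + lam * Phi.T @ Phi       # normal-equation matrix of the u-update

for k in range(200):
    # u^{k+1} = argmin_u 0.5 * ||u||^2 + (lam / 2) * ||Phi u - y + b^k||^2
    u = np.linalg.solve(M, lam * Phi.T @ (y - b))
    # b^{k+1} = b^k + (Phi u^{k+1} - y): the constraint residual is added back
    b = b + (Phi @ u - y)

u_min_norm = np.linalg.pinv(Phi) @ y    # solution of min 0.5||u||^2 s.t. Phi u = y
residual = np.linalg.norm(Phi @ u - y)
```

Note that $\lambda$ stays fixed and moderate throughout: the constraint is enforced by accumulating the residual in $\mathbf{b}^k$ rather than by driving a penalty weight to infinity as in the continuation approach.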



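Indeed, by $\mathbf{p}^0 = \mathbf{0}$ and induction on $\mathbf{p}^k$, we obtain $\mathbf{p}^k = -\lambda\Phi^*\sum_{j=1}^{k}(\Phi\mathbf{u}^j - \mathbf{y})$. Substituting this into the first step of (78) yields

\[
\begin{aligned}
B_J^{\mathbf{p}^k}(\mathbf{u}, \mathbf{u}^k) + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2
&= J(\mathbf{u}) - J(\mathbf{u}^k) - \langle \mathbf{u} - \mathbf{u}^k, \mathbf{p}^k \rangle + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2 \\
&= J(\mathbf{u}) - \langle \mathbf{u}, \mathbf{p}^k \rangle + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2 + C_2 \\
&= J(\mathbf{u}) + \lambda\Bigl\langle \Phi\mathbf{u}, \sum_{j=1}^{k}(\Phi\mathbf{u}^j - \mathbf{y}) \Bigr\rangle + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y}\|_2^2 + C_2 \\
&= J(\mathbf{u}) + \frac{\lambda}{2}\Bigl\|\Phi\mathbf{u} - \mathbf{y} + \sum_{j=1}^{k}(\Phi\mathbf{u}^j - \mathbf{y})\Bigr\|_2^2 + C_3,
\end{aligned}
\tag{80}
\]

where $C_2$ and $C_3$ are independent of $\mathbf{u}$. By the definition of $\mathbf{u}^{k+1}$ in (78), we have that

\[
\mathbf{u}^{k+1} = \arg\min_{\mathbf{u}}\ J(\mathbf{u}) + \frac{\lambda}{2}\Bigl\|\Phi\mathbf{u} - \mathbf{y} + \sum_{j=1}^{k}(\Phi\mathbf{u}^j - \mathbf{y})\Bigr\|_2^2. \tag{81}
\]

Define $\mathbf{b}^k = \sum_{j=1}^{k}(\Phi\mathbf{u}^j - \mathbf{y})$; then we have

\[
\mathbf{b}^{k+1} = \mathbf{b}^k + (\Phi\mathbf{u}^{k+1} - \mathbf{y}), \qquad \mathbf{b}^0 = \mathbf{0}. \tag{82}
\]

With this, (81) becomes

\[
\mathbf{u}^{k+1} = \arg\min_{\mathbf{u}}\ J(\mathbf{u}) + \frac{\lambda}{2}\|\Phi\mathbf{u} - \mathbf{y} + \mathbf{b}^k\|_2^2. \tag{83}
\]

Combining (83) and (82) yields (79). It is this form (79) that will be used to derive the split Bregman iteration.

The convergence results of the Bregman iteration (78) (or (79)) were given in [23], [33]. It was shown that the sequence $\mathbf{u}^k$ generated by (78) (or (79)) weakly converges to a solution of (75).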